Determining Ambiguity Classes for Part-of-Speech Tagging
نویسنده
چکیده
We examine how words group together in the lexicon, in terms of ambiguity classes, and use this information in a redefined tagset to improve POS tagging. In light of errors in the training data and a limited amount of annotated data, we investigate ways to define ambiguity classes for words which consider the lexicon as a whole and predict unknown uses of words. Fitting words to typical ambiguity classes is shown to provide more accurate ambiguity classes for words and to significantly improve tagging performance.
منابع مشابه
A Bayesian LDA-based model for semi-supervised part-of-speech tagging
We present a novel Bayesian model for semi-supervised part-of-speech tagging. Our model extends the Latent Dirichlet Allocation model and incorporates the intuition that words’ distributions over tags, p(t|w), are sparse. In addition we introduce a model for determining the set of possible tags of a word which captures important dependencies in the ambiguity classes of words. Our model outperfo...
متن کاملبرچسبگذاری ادات سخن زبان فارسی با استفاده از مدل شبکۀ فازی
Part of speech tagging (POS tagging) is an ongoing research in natural language processing (NLP) applications. The process of classifying words into their parts of speech and labeling them accordingly is known as part-of-speech tagging, POS-tagging, or simply tagging. Parts of speech are also known as word classes or lexical categories. The purpose of POS tagging is determining the grammatical ...
متن کاملPart-of-Speech Tagging with Two Sequential Transducers
The article presents a method of constructing and applying a cascade consisting of a leftand a right-sequential finite-state transducer, and , for part-of-speech disambiguation. In the process of POS tagging, every word is first assigned a unique ambiguity class that represents the set of alternative tags that this word can occur with. The sequence of the ambiguity classes of all words of one s...
متن کاملThe Role of Non-Ambiguous Words in Natural Language Disambiguation
This paper describes an unsupervised approach for natural language disambiguation, applicable to ambiguity problems where classes of equivalence can be defined over the set of words in a lexicon. Lexical knowledge is induced from non-ambiguous words via classes of equivalence, and enables the automatic generation of annotated corpora. The only requirements are a lexicon and a raw textual corpus...
متن کاملAddressing Ambiguity in Unsupervised Part-of-Speech Induction with Substitute Vectors
We study substitute vectors to solve the part-of-speech ambiguity problem in an unsupervised setting. Part-of-speech tagging is a crucial preliminary process in many natural language processing applications. Because many words in natural languages have more than one part-of-speech tag, resolving part-of-speech ambiguity is an important task. We claim that partof-speech ambiguity can be solved u...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007